Pattern recognition methods have proven to be suitable analysis tools for handling the high dimensionality of neuroimaging data. However, most studies combining neuroimaging with pattern recognition methods focus on two-class classification problems, usually aiming to discriminate patients with a specific condition (e.g., Alzheimer's disease) from healthy controls. In this perspective paper we highlight the potential of one-class support vector machines (OC-SVM) as an unsupervised or exploratory approach that can be used to create normative rules in a multivariate sense. In contrast with the standard SVM, which finds an optimal boundary separating two classes (discriminating boundary), the OC-SVM finds the boundary enclosing a specific class (characteristic boundary). If the OC-SVM is trained with patterns of healthy control subjects, the distance to the boundary can be interpreted as an abnormality score. This score might allow quantification of symptom severity or provide insights about subgroups of patients. We provide an intuitive description of the basic concepts in one-class classification and of the foundations of OC-SVM, review current applications, and discuss how this tool can bring new insights to neuroimaging studies.
INTRODUCTION

Several quantitative methods are available to analyze neuroimaging data. The development of voxel-based morphometry (Ashburner and Friston, 2000), cortical surface modeling (Fischl et al., 1999), and deep-structure volumetry (Bigler et al., 1997; Appenzeller et al., 2005; Zetzsche et al., 2006) started a remarkable series of innovations. At the same time, functional magnetic resonance imaging (fMRI) based on the BOLD signal (Ogawa et al., 1990) has become widely used in neuroscience research. All these developments, combined with advances in image acquisition protocols, have led to the accumulation of a huge amount of data.

There have been many applications of machine learning to clinical problems (e.g., classifying patients vs. healthy controls; Kloppel et al., 2011; Mwangi et al., 2012). However, an important aspect that has been less explored is how to define normative rules for the neuroimaging data of a population, in order to define what a "typical brain" is and to measure the distance from a single subject to these patterns of typical brains. Pattern recognition methods are highly suitable for this purpose, since they were developed to automatically discover regularities in high-dimensional data through the use of computer algorithms (Bishop, 2006). To the best of our knowledge, there have been no initiatives to define normative rules in neuroimaging using other unsupervised learning methods. There is an inherent difficulty in evaluating results from this kind of method (e.g., how to determine the ideal number or size of the clusters). Optimization is usually solved in a suboptimal manner using heuristics, as an objective solution is either unknown or infeasible due to algorithmic complexity and computational effort (Moller et al., 2010). Janoos et al. (2010) presented preliminary results from the application of a diffusion distance to perform clustering on the space of fMRI volumes in order to identify distinct brain states. Using clustering, Nettiksimmons et al. (2010) were able to identify subtypes among healthy controls that might represent the earliest stages of subclinical cognitive decline and Alzheimer's disease. However, they relied on visual assessment to choose the number of clusters.

In this perspective article, we discuss how the machine-learning framework can be applied to build normative rules from neuroimaging databases. We propose the one-class support vector machine (OC-SVM) as a suitable tool for this purpose. We also discuss some technological challenges and perspectives regarding applications of OC-SVM to neuroimaging data.

PATTERN RECOGNITION AND NEUROIMAGING 

Most studies applying pattern recognition methods to clinical neuroimaging focus on two-class classification problems, usually a group of healthy/typical subjects and a group with a very specific condition (e.g., major depression, Alzheimer's disease, etc.). In the general framework, pattern recognition approaches receive a set of observations (input variables and their respective class labels) and estimate a decision rule or model that can be applied to new observations. Once the rule is learned, given the input variables of a new example, the model returns its expected label. Thus, the pattern recognition framework can be described as a machine that makes a class prediction y (label) for some unseen input vector x (input variables).
The most popular pattern recognition approach used in neuroimaging applications is the Support Vector Machine (SVM; Boser et al., 1992; Cortes and Vapnik, 1995; Vapnik, 1998). The SVM has received increasing interest due to attractive properties such as high generalization power (i.e., its ability to perform accurately on new, unseen examples after being trained on a data set, usually called the training data) and good scalability to high-dimensional data, a particularly important property for neuroimaging applications. An additional important property, discussed in the following sections, is the ability of the SVM to perform non-linear classification (when the rule to obtain the class prediction is based on a non-linear function of the input variables).

The heterogeneity of the neural substrates of some neurological/psychiatric disorders might limit the application of two-class SVMs to discriminating these patient populations from healthy subjects. Individuals suffering from a pathological condition can exhibit different patterns of brain abnormalities (e.g., subtypes), which will affect the ability of pattern recognition approaches to find a reproducible discriminating pattern.

NORMATIVE RULES AND METRICS 

When only one input variable is available (e.g., a neuropsychological scale), an intuitive approach to defining normative rules is to find a "normative" interval containing most of the typical observations. Thus, for an unseen observation, we may decide whether it is typical or not by checking whether it falls within the normative interval (Figure 1A). When a specific percentage is defined, for example 95%, this interval is usually defined as the minimum-length interval containing 95% of the typical observations (and thus, 5% of typical observations would be declared atypical). When two input variables are considered, we may extend this approach to a bidimensional probability distribution and define a normative boundary, which is the minimum area containing 95% of the typical observations. In Figure 1B, observations outside the red circle defining the boundary would be considered outliers. The same approach can also be extended to the case of three input variables (Figure 1C), but it is easy to see that the problem becomes more difficult when the number of dimensions increases to tens or hundreds. In the case of high dimensionality, one may be tempted to define normative rules independently for each variable (i.e., in a mass-univariate approach). However, if these variables are correlated (which is usually the case in neuroimaging data), a new observation declared typical by all "univariate" rules can still be atypical from a multivariate perspective, i.e., when all dimensions are jointly taken into account. In this situation, the most appropriate approach is to define "multivariate" normative rules.
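Both ideas in the paragraph above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data (the helper names `shortest_interval` and `mahalanobis` are hypothetical, not taken from any of the studies discussed): the minimum-length 95% interval is found by sliding a window over the sorted values, and two strongly correlated variables show how an observation can pass both univariate rules yet lie far from the joint distribution.

```python
import numpy as np

def shortest_interval(values, coverage=0.95):
    """Return the shortest interval (lo, hi) containing `coverage` of the values."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    k = int(np.ceil(coverage * n))            # number of points the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]       # width of every window of k consecutive points
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def mahalanobis(point, data):
    """Mahalanobis distance of `point` from the mean of `data` (rows = observations)."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = point - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(0)
# Simulated "typical" observations of two variables with correlation 0.9.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
typical = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# Univariate normative intervals, one per variable (mass-univariate approach).
lo0, hi0 = shortest_interval(typical[:, 0])
lo1, hi1 = shortest_interval(typical[:, 1])

new_obs = np.array([1.5, -1.5])               # each coordinate is unremarkable on its own
inside_both = (lo0 <= new_obs[0] <= hi0) and (lo1 <= new_obs[1] <= hi1)
print("typical by both univariate rules:", inside_both)
print("Mahalanobis distance:", round(mahalanobis(new_obs, typical), 2))
```

For a bivariate normal distribution, about 95% of observations have a Mahalanobis distance below $\sqrt{5.99} \approx 2.45$; the new observation lies well beyond that, even though it sits inside both univariate intervals, because it goes against the correlation between the two variables.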

When the number of input variables is small compared to the number of observations, it is possible to use the data to estimate a multivariate density function (Scott, 1992). However, computing the probability density for neuroimaging data might become intractable due to its extremely high dimensionality (equal to the number of input variables derived from an image, possibly the number of intracranial voxels) and the relatively small sample sizes (number of subjects) usually available. Alternatively, one can use methods that only find a decision boundary and do not rely on density estimation, such as the OC-SVM. In fact, much of SVM theory is based on Vapnik's (1998) principle: "never to solve a problem which is more general than the one we actually need to solve." Thus, if our aim is to determine the minimum region (interval, area, etc.) containing a fixed percentage of the typical data, we do not need to estimate the full density function of the multidimensional variable in question.

The challenge of defining normative multivariate rules is also known as the data description problem in the machine-learning literature (Fayyad et al., 1996; Tax and Laskov, 2003). This problem can also be framed as outlier or novelty detection, where the aim is to detect uncharacteristic observations. One possible application of data description is to classification problems where one of the groups (or classes) is representatively sampled (e.g., large random samples), while the other group is severely undersampled. Unbalanced classes are very common in clinical settings, where data from healthy subjects can be easier or less expensive to obtain than data from patients.

The OC-SVM computes a decision boundary with minimal volume around a set of observations from the target group. Once the decision boundary is computed, it can be used to classify new observations as outliers (if they fall outside the boundary) or not.

UNDERSTANDING THE OC-SVM 
BASIC CONCEPTS 

The one-class support vector machine is an unsupervised learning method proposed by Scholkopf et al. (2001). It focuses on determining a boundary enclosing the typical observations, considering only one class of observations (called the target class) for training. After training, new observations can be classified either as typical (in-class) or atypical (outliers), depending on their position in relation to the boundary.

Assume we have a training data set $D$ composed of $N$ observations $x_i$ (e.g., brain images) of dimension $d$ (the number of input variables), i.e., $D = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$. Our aim is to find the most compact region in $\mathbb{R}^d$ containing most of the typical observations. The OC-SVM approach to this problem is to learn a function from the input variables to a real number, $f_x: \mathbb{R}^d \to \mathbb{R}$, such that most of the data in $D$ are mapped to positive values (the typical class) by $f_x$. In other words, they belong to the set $R_x = \{x \in \mathbb{R}^d : f_x(x) > 0\}$, while the volume of $R_x$ is minimized. This problem is called MVS (minimum volume set) estimation.

The OC-SVM belongs to the class of kernel methods. Kernel methods make use of kernel functions to find relationships or patterns in the data, which can then be used to take actions such as classification (Shawe-Taylor and Cristianini, 2004). To provide an intuitive understanding, kernel functions can be informally introduced as "similarity measures" (Scholkopf and Smola, 2002). In the linear case, this "similarity" can be expressed as the dot product between the input vectors representing the observations. A non-linear kernel corresponds to a mapping from the input variables (input space) to another space (feature space) using a non-linear transformation. A linear classification can then be carried out in this feature space, and a linear boundary in the feature space corresponds to a non-linear boundary in the input space.

FIGURE 1 | Examples of boundary specification in the cases of one (A), two (B), and three (C) input variables. An illustration of OC-SVM concepts is depicted in (D).

Formally, a kernel function can be defined as a function that, given two observations $x, x' \in X$, satisfies $k(x, x') = \langle \phi(x), \phi(x') \rangle$, where $X$ is the input space or domain, $\phi$ is a function mapping from $X$ to a feature space, and $\langle \cdot, \cdot \rangle$ is the dot product. Therefore, non-linear kernels can be used to compute dot products in feature spaces without explicitly mapping the observations into these spaces, a property commonly known as the kernel trick.
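The kernel trick can be verified on a toy case. In this sketch (the homogeneous quadratic kernel is a standard textbook example chosen only for illustration; it is not the kernel used in the studies discussed here), for two-dimensional inputs the kernel $k(x, x') = \langle x, x' \rangle^2$ equals the dot product of the explicit features $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the kernel returns a feature-space dot product without ever constructing $\phi$.

```python
import math

def quadratic_kernel(x, z):
    """Homogeneous quadratic kernel k(x, z) = <x, z>^2, computed in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map of the quadratic kernel for 2-D inputs."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(quadratic_kernel(x, z))   # kernel evaluated directly in the input space
print(dot(phi(x), phi(z)))      # same value (up to rounding) via the explicit mapping
```

For the Gaussian kernel discussed next, the corresponding feature space is infinite-dimensional, which is why computing the kernel directly is the only practical option.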

A number of non-linear kernel functions have been proposed for kernel methods, but the Gaussian or Radial Basis Function (RBF) kernel is the most popular one used with the OC-SVM. It depends on the Euclidean distance between the examples and is defined as $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\gamma^2}\right)$. The parameter $\gamma$ can be thought of as the distance scale used to measure the dissimilarity between examples, and it influences the smoothness of the boundary (in the input space). This parameter is usually set using heuristics or tuned using cross-validation procedures (Scholkopf et al., 2001).
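A minimal NumPy sketch of the Gaussian kernel matrix follows, assuming the width parameterization written above (note that some software, such as scikit-learn, parameterizes the RBF kernel as $\exp(-\gamma \|x - x'\|^2)$ instead, so its $\gamma$ plays the inverse role). The function name and the simulated data are illustrative. It shows two properties: $k(x, x) = 1$ for every observation, so every mapped observation has unit norm and lies on a hypersphere in feature space, and a wider kernel makes distinct observations look more similar, which yields a smoother boundary.

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * gamma^2)) for all row pairs.

    gamma is treated as a width (length-scale) parameter, as in the text.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against rounding.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)
    return np.exp(-sq / (2.0 * gamma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))                 # five simulated observations, three variables

K_narrow = rbf_kernel_matrix(X, X, gamma=0.5)
K_wide = rbf_kernel_matrix(X, X, gamma=5.0)

# Unit self-similarity: every observation has norm 1 in feature space.
print(np.allclose(np.diag(K_narrow), 1.0))
# Off-diagonal similarities grow with the kernel width.
off = ~np.eye(len(X), dtype=bool)
print(K_narrow[off].mean() < K_wide[off].mean())
```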

In Figure 1D we present an illustration of the $\nu$-OC-SVM solution based on the RBF kernel. The target data (solid black circles) are mapped (through the RBF kernel) from the input-variable space onto a hypersphere (in black). The problem then consists of finding the smallest hypersphere (in red) enclosing a predefined fraction, $(1 - \nu)$, of the observations. Note that this optimization problem can be reformulated as finding the hyperplane (in blue, defined by a weight vector $w$) that is most distant from the origin such that a fraction $(1 - \nu)$ of the observations is separated from the origin by it. In other words, the OC-SVM algorithm returns a function $f$ that takes positive values in the minimum region capturing most of the training data points (typical observations) and negative values elsewhere (atypical). The OC-SVM can thus also be viewed as a two-class classification problem in which the target data form one class and the origin is the other.

In one-class classification problems, a false positive occurs when a truly typical observation is erroneously classified as atypical. For a given classifier, the probability of a false positive misclassification is called the false positive rate. An issue to be addressed in OC-SVM is the choice of the parameter $\nu$, which controls the false positive rate. The parameter $\nu$ can be fixed a priori and corresponds to the fraction of observations of the typical data that will be assigned as atypical (Type I error); formally, $\nu$ is an upper bound on the fraction of training observations falling outside the boundary and a lower bound on the fraction of support vectors (Scholkopf et al., 2001).
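To make the role of $\nu$ concrete, the $\nu$-OC-SVM training problem of Scholkopf et al. (2001) can be written in its dual form: minimize $\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)$ subject to $0 \le \alpha_i \le 1/(\nu N)$ and $\sum_i \alpha_i = 1$, with decision function $f(x) = \sum_i \alpha_i k(x_i, x) - \rho$. The sketch below solves this dual with a simplified SMO-style pairwise update in NumPy, written for readability rather than speed (libraries such as LIBSVM use more refined solvers of the same family). The function names, the simulated "healthy control" data, and the choices $\nu = 0.1$ and kernel width $\gamma = 2$ are assumptions for illustration only. Up to the solver tolerance, the fraction of training observations outside the boundary should not exceed $\nu$, and the fraction of support vectors should be at least $\nu$.

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma):
    """Gaussian kernel exp(-||a - b||^2 / (2 gamma^2)); gamma is a width parameter."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * gamma ** 2))

def fit_oc_svm(X, nu=0.1, gamma=1.0, tol=1e-6, max_iter=50_000):
    """Hypothetical helper: solve the nu-OC-SVM dual with a simple SMO-style loop.

    Dual: minimize 0.5 * a^T K a  subject to  0 <= a_i <= 1/(nu*N),  sum(a) = 1.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    K = rbf_kernel_matrix(X, X, gamma)
    C = 1.0 / (nu * n)                    # box constraint on each coefficient
    alpha = np.full(n, 1.0 / n)           # feasible start (requires 0 < nu <= 1)
    G = K @ alpha                         # gradient of the dual objective
    for _ in range(max_iter):
        up = np.flatnonzero(alpha < C)    # coefficients that can still increase
        low = np.flatnonzero(alpha > 0)   # coefficients that can still decrease
        i = up[np.argmin(G[up])]
        j = low[np.argmax(G[low])]
        if G[j] - G[i] < tol:             # maximal violating pair within tolerance: stop
            break
        # Move mass delta from alpha_j to alpha_i, which keeps sum(alpha) = 1.
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        room_i = C - alpha[i]
        delta = min((G[j] - G[i]) / eta, room_i, alpha[j])
        alpha[i] = C if delta == room_i else alpha[i] + delta
        alpha[j] = 0.0 if delta == alpha[j] else alpha[j] - delta
        G += delta * (K[:, i] - K[:, j])
    # rho: value of G on "free" support vectors, which lie exactly on the boundary.
    free = (alpha > 0) & (alpha < C)
    if free.any():
        rho = G[free].mean()
    else:
        rho = 0.5 * (G[alpha < C].min() + G[alpha > 0].max())
    return {"X": X, "alpha": alpha, "rho": rho, "gamma": gamma}

def decision_function(model, X_new):
    """f(x) = sum_i alpha_i k(x_i, x) - rho: positive inside the boundary, negative outside."""
    K_new = rbf_kernel_matrix(np.asarray(X_new, dtype=float), model["X"], model["gamma"])
    return K_new @ model["alpha"] - model["rho"]

rng = np.random.default_rng(42)
controls = rng.normal(size=(200, 2))      # simulated "healthy control" feature vectors
model = fit_oc_svm(controls, nu=0.1, gamma=2.0)

train_f = decision_function(model, controls)
print("fraction of controls outside:", np.mean(train_f < 0))
print("fraction of support vectors:", np.mean(model["alpha"] > 0))
print("centre inside:", decision_function(model, [[0.0, 0.0]])[0] > 0)
print("far point outside:", decision_function(model, [[6.0, 6.0]])[0] < 0)
```

Changing `nu` and re-running shows the trade-off described above: a larger $\nu$ shrinks the boundary and declares more controls atypical.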

MEASURING HOW TYPICAL A SUBJECT IS 

The classification in OC-SVM is based on the output of the decision function $f_x$, which in the case of the RBF kernel is a non-linear function of the input variables. For a new observation, if this function value is positive, the observation is inside the boundary ("normative rule") defining the typical examples. Otherwise, if the function value is negative, the observation is outside the boundary and is therefore classified as atypical. Furthermore, the relative position with respect to the boundary can be used as a measure of abnormality ("normative metric"), which can then be correlated with other clinical or psychological measures for validation.
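The sign of $f_x$ therefore gives the normative rule and its magnitude the normative metric. The short sketch below (NumPy; the function name and the five decision values and clinical scores are made up purely for illustration, not data from any study) turns decision values into labels, flips the sign so that larger values mean "more abnormal", and computes a Pearson correlation with a clinical measure, which is the kind of validation described above.

```python
import numpy as np

def normative_summary(decision_values, clinical_scores):
    """Hypothetical helper: labels, abnormality index, and its Pearson correlation
    with a clinical measure, starting from OC-SVM decision values."""
    f = np.asarray(decision_values, dtype=float)
    # Normative rule: non-negative values are treated as typical (f = 0 is the boundary).
    labels = np.where(f >= 0, "typical", "atypical")
    # Normative metric: larger values mean further outside the boundary.
    abnormality = -f
    r = np.corrcoef(abnormality, np.asarray(clinical_scores, dtype=float))[0, 1]
    return labels, abnormality, r

# Made-up decision values for five patients and their scores on some clinical scale.
f_patients = [0.12, 0.03, -0.05, -0.20, -0.41]
severity = [8, 11, 15, 19, 26]

labels, abnormality, r = normative_summary(f_patients, severity)
print(labels)
print(round(r, 3))
```

With real data, a rank-based correlation may be preferable if the abnormality index or the clinical scale is far from normally distributed.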
OC-SVM in Clinical Research



Predicting treatment response in Depression 



In Mourao-Miranda et al. (2011), patterns of brain activity in response to sad face stimuli were used to investigate whether depressed patients would be classified as outliers in relation to a boundary defined by training an OC-SVM with healthy subjects' patterns only. They found a significant correlation between OC-SVM predictions and HRSD (Hamilton Rating Scale for Depression) scores, showing that the more depressed the individuals were, the more strongly they were predicted to be outliers. In addition, most patients classified as non-outliers by the OC-SVM responded well to the same treatment.
The figure (extracted from the original article) illustrates test predictions after training the OC-SVM on healthy subjects using features extracted from regions of interest. In graph (A), negative values characterize subjects predicted as outliers. In graph (B), test predictions are plotted against the HRSD score in week 0 (i.e., when the scans were acquired, before treatment).
Abnormal coherence patterns in ADHD



Sato et al. (2012) used OC-SVM to obtain an abnormality index for each subject based on patterns of functional connectivity modeled using spectral coherence. They showed that adults and children with ADHD present abnormal connectivity patterns between the dorsal portion of the anterior cingulate gyrus and the precuneus/posterior cingulate (PCC).




The top-left graph in the figure (extracted from the original article) shows the median coherence for ADHD patients and for age-matched and younger typically developing subjects (dotted lines represent 75% quantiles). The box plots present the abnormality indices of pairwise comparisons between groups, showing that ADHD patients differ significantly from the age-matched typically developing (TD) group and resemble the group of younger subjects with typical development.



FIGURE 2 | In this figure box we illustrate two applications of OC-SVM in studies involving different neuropsychological conditions, major depressive disorder and attention-deficit/hyperactivity disorder (ADHD), both addressing the question of defining a boundary characterizing the distribution of brain activation patterns in a normal population. These papers are among the pioneering works and provide an important proof of concept of the potential of OC-SVM classifiers for obtaining biomarkers for diagnosis or even prognosis in neuropsychological conditions. Although there are still few applications to date, this approach seems to be one of the trends in neuroimaging methods for clinical research, with a high potential to be used in clinical routine in the near future. The figures were adapted and reproduced with the authorization of the original publishers.
CURRENT APPLICATIONS IN NEUROIMAGING

There have been a few applications of the OC-SVM to non-clinical fMRI data. Hardoon and Manevitz (2005) applied OC-SVM to learn the pattern of brain activity associated with a motor task. In a proof-of-concept paper, Sato et al. (2009) applied the method to motor networks based on functional connectivity estimation in order to construct a normative connectivity database. This study showed that subjects identified as outliers scored in the tails of the distribution of a laterality index (Edinburgh Inventory). Song and Wyrwicz (2009) applied the OC-SVM as an approach to classify voxels as activated or non-activated. According to the authors, this framework can provide robust and accurate mapping of functional activation.

Regarding clinical applications (see Figure 2), a recent work has demonstrated possible advantages of using OC-SVM over two-class SVM in classifying depressed patients vs. healthy controls based on fMRI data. Fu et al. (2008) applied a standard two-class SVM to classify healthy controls vs. unipolar depressed patients based on whole-brain patterns of activation to an emotional stimulus (sad faces). However, the authors were not able to predict treatment response with significant accuracy, as the numbers of responders and non-responders were too small to train a two-class SVM. The same data set was later re-analyzed using the OC-SVM by Mourao-Miranda et al. (2011). In this study the authors found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression scores, i.e., the more depressed the patients were, the higher their abnormality score was. Furthermore, the OC-SVM split the patient group into two subgroups whose membership was associated with future response to treatment. This example illustrates the potential added value of the OC-SVM framework with respect to two-class classification.

A more recent application of OC-SVM to neuroimaging data has been the investigation of attention-deficit/hyperactivity disorder (ADHD; Sato et al., 2012; see Figure 2). In this work, the authors applied OC-SVM to patterns of functional connectivity and demonstrated that adults and children with ADHD present abnormal connectivity. An interesting conclusion was that the connectivity patterns of ADHD patients were more similar to those of younger typically developing subjects, in agreement with the hypothesis that the disorder is associated with abnormal brain maturation.

TECHNOLOGICAL CHALLENGES 

It should be emphasized that the OC-SVM tries to solve a less constrained, and therefore more difficult, problem than the standard two-class SVM. In conventional two-class classification, in which data from both classes are available, the decision boundary is supported from both sides by training examples. In one-class classification only one class of data is available, and thus the boundary is supported from one side only. One of the challenges is to decide how tightly the boundary should fit around the data in each of the dimensions (which is controlled by the kernel parameter). The curse of dimensionality becomes more severe in one-class classification problems, as the boundary has to be defined in all directions. It is therefore expected that one-class classification will require larger training samples than conventional classification.

Another obstacle, inherent to the unsupervised nature of OC-SVM, is its high susceptibility to noise and uninformative variables. OC-SVM attempts to take into account the entire set of input variables to learn common patterns. A direct consequence of this limitation is that the input variables should be carefully chosen when applying OC-SVM. Therefore, a priori knowledge about the relevance of the input variables is necessary to increase the sensitivity of the method.
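The effect of uninformative variables can be illustrated with a simplified simulation. Because the RBF kernel depends only on Euclidean distances, every added noise variable contributes to every distance. The sketch below uses the distance to the control mean as a stand-in for an OC-SVM score (a deliberate simplification, not the OC-SVM itself); the function name, the effect size of 4 SDs, and the sample sizes are illustrative assumptions. Simulated "patients" differ from simulated controls in one informative variable only, and their average standardized deviation shrinks as purely random variables are appended.

```python
import numpy as np

def mean_deviation_z(n_noise, effect=4.0, n_controls=500, n_subjects=200, seed=0):
    """Hypothetical helper: average standardized squared distance of deviant subjects
    from the control mean, with `n_noise` uninformative variables appended."""
    rng = np.random.default_rng(seed)
    d = 1 + n_noise
    controls = rng.normal(size=(n_controls, d))
    subjects = rng.normal(size=(n_subjects, d))
    subjects[:, 0] += effect                          # abnormal only in variable 0
    centre = controls.mean(axis=0)
    ctrl_sq = ((controls - centre) ** 2).sum(axis=1)  # squared distances of controls
    subj_sq = ((subjects - centre) ** 2).sum(axis=1)  # squared distances of subjects
    # Standardize the subjects' distances against the controls' distance distribution.
    z = (subj_sq - ctrl_sq.mean()) / ctrl_sq.std()
    return float(z.mean())

for n_noise in (0, 10, 100, 1000):
    print(n_noise, round(mean_deviation_z(n_noise), 2))
```

The same deviation that stands out clearly with one variable becomes hard to distinguish from ordinary variability among a thousand uninformative ones, which is why careful selection of input variables matters.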

Finally, it is important to mention that the measure of abnormality obtained using OC-SVM is not specific, since this index reports deviation from normality in any direction. Thus, labeling a subject as an "outlier" does not provide any qualitative information about the abnormality, which means that further exploration of the data must be carried out.

FUTURE DIRECTIONS 

Considering the growing number of large multi-center neuroimaging databases available, we also foresee an increase in applications of one-class classification approaches for defining normative rules based on these data. There are already initiatives aiming to build large-scale neuroimaging databases, such as the 1000 Connectomes (Biswal et al., 2010), ADNI (Mueller et al., 2005), and the ADHD-200 Sample¹. In addition, there are interdisciplinary and collaborative efforts to provide preprocessed data for both functional and structural neuroimaging². The relevance of these initiatives in clinical applications is evidenced by the recent boom of studies using these shared databases (Babiloni et al., 2012; Meier et al., 2012). Publicly available databases are very attractive resources for the application of pattern recognition, machine learning, and data-mining methods, as they usually comprise large numbers of subjects, enabling the development and testing of different models.

As previously discussed, one of the main obstacles when dealing with structural/functional normative databases is how to define a "typical" brain pattern using high-dimensional data. The high inter-subject variability and distinct etiologies of most neurological and neuropsychiatric conditions make the problem extremely difficult to tackle. The OC-SVM is a promising tool that can be applied to explore and extract relevant characteristics and information from these large databases with the aim of building normative rules. As in other medical areas, a cut-off value can be determined by locating a subject's score in relation to the distribution derived from a number of healthy subjects. The fact that the OC-SVM approach can rely entirely on data from a healthy control sample makes it particularly suitable for the identification of rare disorders, when data from only a very small number of patients are available, or for the identification of subgroups of patients.
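Locating a subject's score within the healthy-control distribution can be sketched in a few lines. This NumPy example assumes that lower scores mean more atypical (as with OC-SVM decision values) and that the control scores come from controls not used to train the model (e.g., obtained by cross-validation), since scores of training subjects are biased toward the inside of the boundary. The function names, the 5% cut-off level, and the simulated scores are illustrative assumptions.

```python
import numpy as np

def cutoff_from_controls(control_scores, alpha=0.05):
    """Cut-off below which a score is declared atypical: the alpha-quantile of controls."""
    return float(np.quantile(np.asarray(control_scores, dtype=float), alpha))

def percentile_rank(score, control_scores):
    """Percentage of control scores at or below `score` (low = unusual for OC-SVM scores)."""
    return 100.0 * float(np.mean(np.asarray(control_scores, dtype=float) <= score))

rng = np.random.default_rng(7)
# Simulated held-out control scores (not real data).
control_scores = rng.normal(loc=0.3, scale=0.1, size=500)
cut = cutoff_from_controls(control_scores, alpha=0.05)

subject_score = 0.05                        # hypothetical decision value of a new subject
print("cut-off:", round(cut, 3))
print("flagged as atypical:", subject_score < cut)
print("percentile rank:", percentile_rank(subject_score, control_scores))
```

Reporting the percentile rank alongside the binary flag keeps the graded, "normative metric" information that a single cut-off discards.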